Hungarian Word-Sense Disambiguated Corpus
نویسندگان
چکیده
To create the first Hungarian WSD corpus, 39 suitable word form samples were selected for the purpose of word sense disambiguation. Among others, selection criteria required the given word form to be frequent in Hungarian language usage (frequency rates available in the Hungarian National Corpus (HNC) were used for measurement (Váradi, 2000)), and to have more than one sense considered frequent in usage. HNC and its Heti Világgazdaság (HVG) subcorpus provided the basis for corpus text selection. This way, each sample has a relevant context (the whole HVG article), and information on the lemma, POS-tagging and automatic tokenization is also
منابع مشابه
Manually Annotated Hungarian Corpus
Current paper presents the results of a two-year project during which a consortium of the University of Szeged and the MorphoLogic Ltd. Budapest developed a morpho-syntactically parsed and annotated (disambiguated) corpus for Hungarian. For morpho-syntactic encoding, the Hungarian version of MSD (MorphoSyntactic Description) has been used. The corpus contains texts of five different topic areas...
متن کاملAn Iterative Approach to Word Sense Disambiguation
In this paper, we present an iterative algorithm for Word Sense Disambiguation. It combines two sources of information: Word_Net and a semantic tagged corpus, for the purpose of identifying the correct sense of the words in a given text. It differs from other standard approaches in that the disambiguation process is performed in an iterative manner: starting from free text, a set of disambiguat...
متن کاملExploiting Parallel Corpora for Supervised Word-Sense Disambiguation in English-Hungarian Machine Translation
In this paper we present an experiment to automatically generate annotated training corpora for a supervised word sense disambiguation module operating in an English-Hungarian and a Hungarian-English machine translation system. Training examples for the WSD module are produced by annotating ambiguous lexical items in the source language (words having several possible translations) with their pr...
متن کاملAdaptive Word Sense Disambiguation Using Lexical Knowledge in Machine-readable Dictionary
This paper describes a general framework for adaptive conceptual word sense disambiguation. The proposed system begins with knowledge acquisition from machine-readable dictionaries. Central to the approach is the adaptive step that enriches the initial knowledge base with knowledge gleaned from the partial disambiguated text. Once the knowledge base is adjusted to suit the text at hand, it is a...
متن کاملA Naïve Bayes Approach for Word Sense Disambiguation
The word sense disambiguation (WSD) is the task ofautomatically selecting the correct sense given a context and it helps in solving many ambiguity problems inherently existing in all natural languages.Statistical Natural Language Processing (NLP),which is based on probabilistic, stochastic and statistical methods, has been used to solve many NLP problems.The Naive Bayes algorithm which is one o...
متن کامل